Tightly Packed Tries: How to Fit Large Models into Memory, and Make them Load Fast, Too
Abstract
We present Tightly Packed Tries (TPTs), a compact implementation of read-only, compressed trie structures with fast on-demand paging and short load times. We demonstrate the benefits of TPTs for storing n-gram back-off language models and phrase tables for statistical machine translation. Encoded as TPTs, these databases require less space than flat text file representations of the same data compressed with the gzip utility. At the same time, they can be mapped into memory quickly and be searched directly in time linear in the length of the key, without the need to decompress the entire file. The overhead for local decompression during search is marginal.
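The abstract describes the idea only at a high level, so the following is a minimal sketch of the general technique and not the authors' actual file format. It assumes keys are sequences of small non-negative integers (for example, word IDs) and values are non-negative integers. The node layout, the variable-length integer encoding, and the names `pack` and `lookup` are all hypothetical choices made for illustration. Children are written before their parents, so each parent can store a short backward offset to every child, and a lookup walks the byte buffer directly without unpacking it.

```python
def encode_varint(n: int) -> bytes:
    """Encode a non-negative int using 7 bits per byte (high bit = 'more follows')."""
    out = bytearray()
    while True:
        byte = n & 0x7F
        n >>= 7
        if n:
            out.append(byte | 0x80)
        else:
            out.append(byte)
            return bytes(out)


def decode_varint(buf, pos: int):
    """Decode one varint starting at buf[pos]; return (value, next position)."""
    result = 0
    shift = 0
    while True:
        byte = buf[pos]
        pos += 1
        result |= (byte & 0x7F) << shift
        if not byte & 0x80:
            return result, pos
        shift += 7


def pack(entries):
    """Serialize {tuple_of_ints: int} into one byte string (illustrative layout).

    Node layout (an assumption of this sketch, not the paper's format):
      varint  value + 1   (0 means "no value stored at this node")
      varint  number of children
      per child: varint symbol, varint backward offset to the child node
    Returns (buffer, offset of the root node).
    """
    trie = {"value": None, "kids": {}}
    for key, value in entries.items():
        node = trie
        for sym in key:
            node = node["kids"].setdefault(sym, {"value": None, "kids": {}})
        node["value"] = value

    buf = bytearray()

    def write(node) -> int:
        # Children first, so their offsets are known when the parent is written.
        kids = [(sym, write(child)) for sym, child in sorted(node["kids"].items())]
        start = len(buf)
        buf.extend(encode_varint(0 if node["value"] is None else node["value"] + 1))
        buf.extend(encode_varint(len(kids)))
        for sym, child_start in kids:
            buf.extend(encode_varint(sym))
            buf.extend(encode_varint(start - child_start))
        return start

    root = write(trie)
    return bytes(buf), root


def lookup(buf, root: int, key):
    """Follow key through the packed buffer; return the stored value or None."""
    pos = root
    for sym in key:
        node_start = pos
        _, p = decode_varint(buf, pos)       # skip this node's value field
        count, p = decode_varint(buf, p)
        for _ in range(count):
            child_sym, p = decode_varint(buf, p)
            back, p = decode_varint(buf, p)
            if child_sym == sym:
                pos = node_start - back      # jump to the child node
                break
        else:
            return None                      # no child for this symbol
    stored, _ = decode_varint(buf, pos)
    return stored - 1 if stored else None
```

Because `lookup` only indexes into the buffer, the same function should work unchanged on an `mmap.mmap` object opened over a file containing the packed bytes, which is the kind of on-demand access the abstract refers to. The number of nodes visited is linear in the key length; this sketch scans each node's children linearly, so the per-step cost also depends on the fan-out.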
Publication date: 2009